Predicting end of utterance in multimodal and unimodal conditions
Authors
Abstract
In this paper, we describe a series of perception studies on uni- and multimodal cues to end of utterance. Stimuli were fragments taken from a recorded interview session, consisting of the parts in which speakers provided answers. The answers varied in length and were presented without the interviewer's preceding question. Subjects had to predict when the speaker would finish his turn, based on video and/or auditory material. The experiment consisted of three conditions: in one condition, the stimuli were presented as recorded (both audio and video); in the two remaining conditions, stimuli were presented in only the auditory or only the visual channel. Results show that the audiovisual condition evoked the fastest reaction times and the visual-only condition the slowest. Arguably, cues from different modalities function as complementary sources and might thus improve prediction.
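The reported ordering of reaction times across the three conditions can be sketched with a small simulation. All numbers below (mean offsets, spread, sample sizes) are illustrative assumptions, not values from the paper; the sketch only shows how per-condition mean reaction times would be compared.

```python
import random
import statistics

random.seed(0)

# Hypothetical mean reaction times (ms) per condition, chosen to reflect the
# reported ordering: audiovisual fastest, visual-only slowest. Illustrative only.
condition_means = {"audiovisual": 350.0, "audio-only": 420.0, "visual-only": 510.0}

def simulate_rts(mean_ms, n=200, sd=60.0):
    """Draw n simulated reaction times from a normal distribution."""
    return [random.gauss(mean_ms, sd) for _ in range(n)]

samples = {cond: simulate_rts(m) for cond, m in condition_means.items()}
means = {cond: statistics.mean(rts) for cond, rts in samples.items()}

# Rank conditions by mean simulated reaction time (fastest first).
ordered = sorted(means, key=means.get)
print(ordered)
```

With these assumed parameters the simulated ranking mirrors the paper's finding: audiovisual before audio-only before visual-only.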
Similar Resources
The interplay between the auditory and visual modality for end-of-utterance detection.
The existence of auditory cues such as intonation, rhythm, and pausing that facilitate end-of-utterance detection is by now well established. It has been argued repeatedly that speakers may also employ visual cues to indicate that they are at the end of their utterance. This raises at least two questions, which are addressed in the current paper. First, which modalities do speakers use for sign...
Using Weighted Distributions for Modeling Skewed, Multimodal and Truncated Data
When the observations reflect a multimodal, asymmetric or truncated construction or a combination of them, using usual unimodal and symmetric distributions leads to misleading results. Therefore, distributions with ability of modeling skewness, multimodality and truncation have been in the core of interest in statistical literature, always. There are different methods to contract ...
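The weighted-distribution idea in the teaser above can be illustrated concretely: for a base density f and weight function w, the weighted density is f_w(x) = w(x) f(x) / E[w(X)]. The sketch below (illustrative, not from the cited paper) uses length-biased sampling, w(x) = x, on an Exp(1) base, for which f_w(x) = x e^(-x) is Gamma(2, 1) with mean 2, and approximates draws from f_w by weighted resampling.

```python
import random

random.seed(1)

# Base sample from f(x) = e^{-x}, i.e. Exp(1), whose mean is 1.
N = 100_000
base = [random.expovariate(1.0) for _ in range(N)]

# Length-biased resampling: each point is picked with probability
# proportional to its weight w(x) = x, approximating draws from
# f_w(x) = x e^{-x}, i.e. Gamma(2, 1), whose mean is 2.
weighted = random.choices(base, weights=base, k=N)

base_mean = sum(base) / N          # close to 1
weighted_mean = sum(weighted) / N  # close to 2
```

The shift of the mean from about 1 to about 2 shows how the weight function deliberately skews the base distribution, which is the mechanism the teaser invokes for modeling skewed or size-biased data.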
Combining User Modeling and Machine Learning to Predict Users' Multimodal Integration Patterns
Temporal as well as semantic constraints on fusion are at the heart of multimodal system processing. The goal of the present work is to develop useradaptive temporal thresholds with improved performance characteristics over state-of-the-art fixed ones, which can be accomplished by leveraging both empirical user modeling and machine learning techniques to handle the large individual differences ...
Semi-supervised Multimodal Learning with Deep Generative Models
In recent years, deep neural networks are used mainly as discriminators of multimodal learning. We should have large amounts of labeled data for training them, but obtaining such data is difficult because it requires much labor to label inputs. Therefore, semi-supervised learning, which improves the discriminator performance using unlabeled data, is important. Among semi-supervised learning, me...
Integration of iconic gestures and speech in left superior temporal areas boosts speech comprehension under adverse listening conditions
Iconic gestures are spontaneous hand movements that illustrate certain contents of speech and, as such, are an important part of face-to-face communication. This experiment targets the brain bases of how iconic gestures and speech are integrated during comprehension. Areas of integration were identified on the basis of two classic properties of multimodal integration, bimodal enhancement and in...
Publication date: 2005